Sigma point policy iteration

Authors

  • Michael H. Bowling
  • Alborz Geramifard
  • David Wingate
Abstract

In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear value function approximation. These algorithms rely on policy-dependent expectations of the transition and reward functions, which require all experience to be remembered and iterated over for each new policy evaluated. We propose to summarize experience with a compact policy-independent Gaussian model. We show how this policy-independent model can be transformed into a policy-dependent form and used to perform policy evaluation. Because closed-form transformations are rarely available, we introduce an efficient sigma point approximation. We show that the resulting Sigma-Point Policy Iteration algorithm (SPPI) is mathematically equivalent to LSPI for tabular representations and empirically demonstrate comparable performance for approximate representations. However, the experience does not need to be saved or replayed, meaning that for even moderate amounts of experience, SPPI is an order of magnitude faster than LSPI.
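The sigma point approximation the abstract refers to is, at its core, the unscented transform: an expectation E[g(x)] under a Gaussian summary N(mu, Sigma) is estimated by propagating 2n+1 deterministically chosen points through the nonlinearity g. The sketch below shows this generic construction only; the function names, the kappa parameter, and the example nonlinearity are illustrative choices, not the paper's exact model transformation.

```python
import numpy as np

def sigma_points(mu, Sigma, kappa=2.0):
    """Classic unscented-transform construction: 2n+1 points placed
    at the mean and along +/- columns of a matrix square root of
    (n + kappa) * Sigma, with weights that sum to 1."""
    n = mu.shape[0]
    L = np.linalg.cholesky((n + kappa) * Sigma)   # lower-triangular sqrt
    pts = np.vstack([mu, mu + L.T, mu - L.T])     # rows are sigma points
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return pts, w

def expect(g, mu, Sigma):
    """Approximate E[g(x)], x ~ N(mu, Sigma), without sampling:
    a weighted sum of g evaluated at the sigma points."""
    pts, w = sigma_points(mu, Sigma)
    return sum(wi * g(p) for wi, p in zip(w, pts))

# Example: a nonlinear transform of a 2-D Gaussian summary.
mu = np.array([0.5, -1.0])
Sigma = np.array([[0.20, 0.05],
                  [0.05, 0.10]])
print(expect(np.tanh, mu, Sigma))   # close to a large Monte-Carlo estimate
```

Because the points and weights are fixed by (mu, Sigma) alone, the Gaussian summary can be built once from experience and then re-used for every policy evaluated, which is the source of the speedup claimed over replaying experience.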


Similar articles

Point-Based Policy Iteration

We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: At each iteration before convergence, PBPI produces a policy for which the values increase for at least one of a finite set of init...
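For context, the point-based value iteration step that PBPI substitutes for exact policy improvement rests on a per-belief Bellman backup over alpha vectors. Below is a minimal sketch of that backup and a simple PBVI driver; the dense-array interface (T, Z, R) and names are illustrative, not Hansen's or the authors' actual implementation.

```python
import numpy as np

def point_based_backup(b, Gamma, T, Z, R, gamma):
    """One Bellman backup at belief b over the alpha-vector set Gamma.
    T[a]: |S|x|S| transitions, Z[a]: |S|x|O| observation likelihoods
    (rows indexed by next state), R: |S|x|A| rewards."""
    best_val, best_alpha = -np.inf, None
    for a in range(R.shape[1]):
        alpha_a = R[:, a].astype(float)
        for o in range(Z[a].shape[1]):
            # g(s) = sum_{s'} T[a][s, s'] * Z[a][s', o] * alpha(s')
            cands = [T[a] @ (Z[a][:, o] * alpha) for alpha in Gamma]
            alpha_a = alpha_a + gamma * max(cands, key=lambda g: b @ g)
        if b @ alpha_a > best_val:
            best_val, best_alpha = b @ alpha_a, alpha_a
    return best_alpha

def pbvi(B, T, Z, R, gamma, sweeps=30):
    """Point-based value iteration over a fixed, finite belief set B."""
    Gamma = [np.zeros(R.shape[0])]
    for _ in range(sweeps):
        Gamma = [point_based_backup(b, Gamma, T, Z, R, gamma) for b in B]
    return Gamma
```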

New three-step iteration process and fixed point approximation in Banach spaces

In this paper we propose a new iteration process, called the $K^{\ast}$ iteration process, for the approximation of fixed points. We show that our iteration process is faster than the existing well-known iteration processes using numerical examples. Stability of the $K^{\ast}$ iteration process is also discussed. Finally, we prove some weak and strong convergence theorems for Suzuki ge...
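The excerpt does not spell out the $K^{\ast}$ scheme itself, so as a stand-in, the sketch below shows two of the classical processes it would be benchmarked against, Picard and Mann iteration, on a simple contraction; comparing the residual |T(x) - x| after a fixed number of steps is one way such speed claims are checked numerically.

```python
import numpy as np

def picard(T, x0, n):
    """Picard iteration: x_{k+1} = T(x_k)."""
    x = x0
    for _ in range(n):
        x = T(x)
    return x

def mann(T, x0, n, s=0.5):
    """Mann iteration: x_{k+1} = (1 - s) * x_k + s * T(x_k)."""
    x = x0
    for _ in range(n):
        x = (1 - s) * x + s * T(x)
    return x

T = np.cos   # a contraction on [0, 1]; fixed point ~0.739085
for scheme in (picard, mann):
    x = scheme(T, 1.0, 30)
    print(scheme.__name__, x, abs(T(x) - x))   # residual after 30 steps
```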

Solving time-fractional chemical engineering equations by modified variational iteration method as fixed point iteration method

The variational iteration method (VIM) was extended to find approximate solutions of fractional chemical engineering equations. The Lagrange multipliers of the VIM were not identified explicitly. In this paper we improve the VIM by using the concept of the fixed point iteration method. This method was then implemented to solve a system of time-fractional chemical engineering equations. The ob...
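In the integer-order limit, casting such an iteration method as a fixed point iteration reduces to classical successive approximation: u_{n+1}(t) = u_0 + \int_0^t f(s, u_n(s)) ds. The sketch below implements that structure on a grid with trapezoidal quadrature; the fractional-order operators the paper treats would replace this integral with a fractional one, which the toy omits.

```python
import numpy as np

def successive_approx(f, u0, t, sweeps=20):
    """Fixed point (Picard) iteration for u'(t) = f(t, u), u(0) = u0:
        u_{n+1}(t) = u0 + integral_0^t f(s, u_n(s)) ds,
    with the integral computed by the cumulative trapezoidal rule."""
    u = np.full_like(t, u0, dtype=float)
    for _ in range(sweeps):
        fu = f(t, u)
        integral = np.concatenate(
            ([0.0], np.cumsum(0.5 * (fu[1:] + fu[:-1]) * np.diff(t))))
        u = u0 + integral
    return u

t = np.linspace(0.0, 1.0, 101)
u = successive_approx(lambda s, y: y, 1.0, t)  # u' = u, so u = exp(t)
print(np.max(np.abs(u - np.exp(t))))           # small quadrature error
```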

Policy Iteration in Finite Templates Domain

We prove in this paper that policy iteration can be defined generally in a finite domain of templates using Lagrange duality. Such a policy iteration algorithm converges to a fixed point when a very simple technical condition holds. This fixed point furnishes a safe over-approximation of the set of reachable values taken by the variables of a program. We also prove that policy iteration can be ea...

Combined Fixed Point and Policy Iteration for HJB Equations in Finance

Implicit methods for Hamilton-Jacobi-Bellman (HJB) partial differential equations give rise to highly nonlinear discretized algebraic equations. The classic policy iteration approach may not be efficient in many circumstances. In this article, we derive sufficient conditions to ensure convergence of a combined fixed point-policy iteration scheme for the solution of the discretized equations. Numeri...
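The structure described, an outer control update wrapped around an inexact inner solve, can be illustrated on a toy discounted Bellman equation V(s) = min_a [c_a(s) + gamma * (P_a V)(s)], which stands in here for the discretized HJB system; the inner fixed point sweeps replace the exact linear solve of classic policy iteration. The names and problem form below are illustrative, not the article's scheme.

```python
import numpy as np

def combined_fp_policy_iteration(P, c, gamma, inner=10, outer=40):
    """Solve V(s) = min_a [ c[a][s] + gamma * (P[a] V)(s) ].
    Outer loop: greedy policy improvement at every node.
    Inner loop: fixed point sweeps V <- c_pi + gamma * P_pi V
    in place of an exact solve of (I - gamma * P_pi) V = c_pi."""
    V = np.zeros(c[0].shape[0])
    for _ in range(outer):
        q = np.stack([c[a] + gamma * (P[a] @ V) for a in range(len(P))])
        pi = np.argmin(q, axis=0)                      # greedy controls
        P_pi = np.stack([P[a][s] for s, a in enumerate(pi)])
        c_pi = np.array([c[a][s] for s, a in enumerate(pi)])
        for _ in range(inner):                         # inexact evaluation
            V = c_pi + gamma * (P_pi @ V)
    return V, pi

# Two-action toy problem on 3 nodes:
P = [np.full((3, 3), 1 / 3), np.eye(3)]
c = [np.ones(3), np.array([0.5, 1.0, 1.5])]
V, pi = combined_fp_policy_iteration(P, c, 0.9)
print(V, pi)
```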


Journal:

Volume   Issue

Pages  -

Publication date: 2008